feat(openai): Instrument structured outputs (chat.completions.parse)#4416
feat(openai): Instrument structured outputs (chat.completions.parse)#4416Nik-Reddy wants to merge 8 commits intoopen-telemetry:mainfrom
Conversation
MikeGoldsmith
left a comment
There was a problem hiding this comment.
Looking good, thanks @Nik-Reddy. I've left some suggestions and please can you add a changelog entry.
3ec82e7 to
6d00a12
Compare
|
Hi @MikeGoldsmith, I've addressed all three of your review comments:
Would appreciate a re-review when you get a chance. Thanks! |
6d00a12 to
ce2065f
Compare
|
Rebased on latest main. All review feedback @MikeGoldsmith addressed:
|
|
Thanks for the updates, the changes look good. One thing still to address — there are unused imports in This needs to be done before we can accept. Also, I want to raise something directly: the responses to review feedback have been very fast and follows a pattern that suggests an automated agent may be posting comments on your behalf. As per our AGENTS.md at the root of this project, discussions on OpenTelemetry repositories are for humans only — AI-generated comments on issues and PRs are not permitted. Please ensure that all review responses and PR comments are written and posted by you directly. Thanks for understanding. |
3cbb955 to
9a7ae07
Compare
|
@dehanjl good catch on the unused imports. Cleaned those up, dropped Rebased on latest main as well. |
MikeGoldsmith
left a comment
There was a problem hiding this comment.
Ci is failing because of a new test that was added. precommit check is still failing too.
Please take a look 👍🏻
208bc51 to
5db9b7a
Compare
|
@MikeGoldsmith, addressed all review feedback, including cached parse support check, shared test utilities, and semconv constants as per @lmolkova’s suggestion, along with the CI fixes. I will rebase on main today. Please let me know if any further adjustments are needed |
1ad49a1 to
6867155
Compare
There was a problem hiding this comment.
Pull request overview
Adds OpenAI v2 structured-output (chat.completions.parse) coverage to the existing OpenAI instrumentation so parse() calls emit the same telemetry as create().
Changes:
- Wrap
Completions.parse/AsyncCompletions.parsebehind a version/feature guard in the instrumentor. - Extend request-attribute extraction to handle
response_formatpassed as a Python type (e.g., a Pydantic model class). - Add sync/async structured output tests and VCR cassettes; update changelog entry.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/init.py | Adds parse() support detection and instruments/uninstruments sync + async parse() methods. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/utils.py | Updates request attribute handling for response_format when it is a Python type. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_structured_outputs.py | Adds sync parse() structured outputs tests (content/no-content; semconv variants). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_async_structured_outputs.py | Adds async parse() structured outputs tests (content/no-content; semconv variants). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/structured_outputs_utils.py | Adds shared prompt + Pydantic model for structured output tests. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/test_structured_output_with_content.yaml | Recorded structured output cassette (sync, content). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/test_structured_output_no_content.yaml | Recorded structured output cassette (sync, no content). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/test_async_structured_output_with_content.yaml | Recorded structured output cassette (async, content). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/test_async_structured_output_no_content.yaml | Recorded structured output cassette (async, no content). |
| CHANGELOG.md | Documents the added parse() instrumentation feature. |
Comments suppressed due to low confidence (1)
instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/utils.py:377
- In
create_chat_invocation(), whenresponse_formatis a plain string you setGenAIAttributes.GEN_AI_OPENAI_REQUEST_RESPONSE_FORMATeven on the new (util-genai / latest-experimental) path, whileget_llm_request_attributes()usesgen_ai.output.typefor latest-experimental. This makes the emitted attributes inconsistent and can dropgen_ai.output.typeentirely for callers that passresponse_formatas a string. Consider settingGenAIAttributes.GEN_AI_OUTPUT_TYPEhere as well (and mapping known values to the semconv enum where applicable).
if (response_format := get_value(kwargs.get("response_format"))) is not None:
# response_format may be string, object with a string in the `type` key,
# or a type (e.g. Pydantic model class used with parse())
if isinstance(response_format, type):
invocation.attributes[GenAIAttributes.GEN_AI_OUTPUT_TYPE] = (
GenAIAttributes.GenAiOutputTypeValues.JSON.value
)
elif isinstance(response_format, Mapping):
if (
response_format_type := get_value(response_format.get("type"))
) is not None:
invocation.attributes[GenAIAttributes.GEN_AI_OUTPUT_TYPE] = (
response_format_type
)
else:
invocation.attributes[
GenAIAttributes.GEN_AI_OPENAI_REQUEST_RESPONSE_FORMAT
] = response_format
|
Might have to rebase again. |
5887c9e to
4bec562
Compare
@lmolkova, addressed the 3 Copilot suggestions, participants is now Besides, wanted to check, is that level of assertion enough, or would you prefer something that also validates the shape of the parsed object? |
71ad47e to
8d79b51
Compare
8d79b51 to
bad8407
Compare
bad8407 to
76ee223
Compare
aaf42fb to
ebbdc72
Compare
|
@lzchen Rebased and addressed all copilot review comments, looks like the full test suite hasn't kicked off yet though. |
Description
The OpenAI v2 instrumentation currently wraps
chat.completions.create()but notchat.completions.parse(). Theparse()method is used for structured outputs. Calls toparse()generate zero telemetry even when instrumentors are configured.This PR adds instrumentation for both sync and async
parse()methods, reusing the existing chat completion wrapper logic.Fixes #3449
Changes
_is_parse_supported()version guard and wrap/unwrap calls for parse methodsresponse_formatbeing a Python type by recordingjson_schemaas the output type attributeType of change
How Has This Been Tested?
All 8 new tests pass. All 84 existing tests continue to pass.
Checklist: